SUMMARY


A  central  limit  theorem  for  normalized  sums  of  random  variables  that  form  an 
autoregressive  integrated  moving  average  (ARIMA)  process  is  developed.  The  need  for  such  a 
limit  theorem  is  discussed  in  connection  with  modeling  total  compensation  costs  associated  with 
insurance  or  medical  claims. 
1. Introduction

In order to make the discussion reasonably self-contained, it is necessary to introduce
autoregressive integrated moving average processes and related concepts. For a complete
development and discussion of the mathematics and applications of these processes, refer to
Brockwell and Davis (1992), upon which the present notation and discussion are based.

Consider a fixed probability space $(\Omega, \mathcal{F}, P)$ on which all subsequent random variables
will be defined. A collection of random variables $\{Z_t,\ t=0,\pm1,\pm2,\ldots\}$ is said to be a white noise
process if $EZ_t=0$, $E(Z_t^2)=\sigma^2$ for all $t$, and $E(Z_sZ_t)=0$ for all $s$, $t$ with $s\neq t$. This is denoted by
$\{Z_t\}\sim WN(0,\sigma^2)$. If the $Z_t$'s are also independent and identically distributed, this is indicated by
$\{Z_t\}\sim IID(0,\sigma^2)$. $\{Y_t,\ t=0,\pm1,\pm2,\ldots\}$ is said to be an autoregressive moving average process
with autoregressive order p and moving average order q, denoted by $\{Y_t\}\sim ARMA(p,q)$, if $Y_t$
satisfies a set of difference equations of the form (the $\phi_i$'s and $\theta_j$'s are fixed real constants)

$$Y_t-\phi_1Y_{t-1}-\cdots-\phi_pY_{t-p}=Z_t+\theta_1Z_{t-1}+\cdots+\theta_qZ_{t-q} \tag{1}$$

for all integer $t$, where $\{Z_t\}\sim WN(0,\sigma^2)$, and the polynomial $\phi(z)=1-\phi_1z-\phi_2z^2-\cdots-\phi_pz^p$ has no
roots on the unit circle $\{z: |z|=1\}$ in the complex plane. Introducing the back-shift operator B,
where $BY_t=Y_{t-1}$, $B^jY_t=Y_{t-j}$, for integer $j$, (1) can be written compactly as

$$\phi(B)Y_t=\theta(B)Z_t \tag{2}$$

where $\phi(z)=1-\phi_1z-\cdots-\phi_pz^p$, and $\theta(z)=1+\theta_1z+\theta_2z^2+\cdots+\theta_qz^q$. By definition, $EY_t=0$ for all $t$. The
process $\{Y_t,\ t=0,\pm1,\ldots\}$ is said to be an ARMA(p,q) with mean $\mu$ if $\{Y_t-\mu,\ t=0,\pm1,\ldots\}$ is an
ARMA(p,q) process. The condition
$\phi(z)\neq0$ for $|z|=1$
ensures that the process $\{Y_t,\ t=0,\pm1,\ldots\}$ satisfying (1) is stationary, which means that the
autocovariance function $\gamma(s,t)=\mathrm{cov}(Y_s,Y_t)$ depends only on $|t-s|$, so that it can be expressed as

$$\gamma_Y(h)=\mathrm{cov}(Y_t,Y_{t+h}) \tag{3}$$

without ambiguity. Moreover, if $\phi(z)\neq0$ for $|z|=1$, then the difference equations (1) have a
unique solution given by

$$Y_t=\sum_{j=-\infty}^{\infty}\psi_jZ_{t-j} \tag{4}$$

where the series converges almost surely and in mean square, the coefficients $\{\psi_j\}$ satisfy
$\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$, and are the coefficients in the Laurent expansion $\theta(z)/\phi(z)=\sum_{j=-\infty}^{\infty}\psi_jz^j$, valid for $z$
satisfying $r<|z|<1/r$, for some $r\in(0,1)$. In some applications, it is desirable to require that the
representation (4) have $\psi_j=0$ for $j<0$, so that $Y_t$ is expressed as a linear combination of current
and past $Z_t$'s. This is true if $\phi(z)\neq0$ for $|z|\le1$, i.e. all the roots of $\phi(\cdot)$ lie outside the unit circle in
the complex plane. Such a process is then called a causal ARMA(p,q). It can be shown that as
long as $\phi(z)\neq0$ for $|z|=1$, an ARMA(p,q) process always has a causal representation. That is, it is
always possible to redefine the white noise process and the polynomial $\phi$ so that the process is
causal. It will be assumed that all ARMA(p,q) processes discussed herein are causal.
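As an illustration of the causal representation, the coefficients $\psi_j$ can be obtained by matching powers of $z$ in $\phi(z)\sum_j\psi_jz^j=\theta(z)$. The sketch below assumes the process is causal (all roots of $\phi$ outside the unit circle); the function name `psi_weights` and the example coefficients are hypothetical and chosen only for illustration.

```python
def psi_weights(phi, theta, n_terms):
    """First n_terms coefficients psi_0, psi_1, ... of theta(z)/phi(z).

    phi   = [phi_1, ..., phi_p]     (autoregressive coefficients)
    theta = [theta_1, ..., theta_q] (moving average coefficients)
    Matching powers of z in phi(z) * psi(z) = theta(z) gives
    psi_j = theta_j + sum_{k=1}^{min(j,p)} phi_k * psi_{j-k}, with theta_0 = 1
    and theta_j = 0 for j > q.
    """
    psi = []
    for j in range(n_terms):
        if j == 0:
            th = 1.0
        elif j <= len(theta):
            th = theta[j - 1]
        else:
            th = 0.0
        ar = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
        psi.append(th + ar)
    return psi


# Illustrative ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.4;
# here psi_j = (phi_1 + theta_1) * phi_1**(j - 1) for j >= 1.
print(psi_weights([0.5], [0.4], 5))
```

For a pure AR(1) the same recursion returns $\psi_j=\phi_1^j$, which is absolutely summable exactly when $|\phi_1|<1$.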


ARMA  processes  are  useful  in  describing  or  approximating  a  wide  variety  of  stationary 
processes  whose  autocovariance  functions  approach  zero  as  the  lag  approaches  infinity.  A  great 
many methods have been devised for estimating the orders p and q, and the unknown $\phi_i$'s and $\theta_j$'s
in (1) for a given set of observations. For example, see Box and Jenkins (1970), Brockwell and
Davis (1991), and Priestley (1981). Often, however, it is necessary to model various types of
nonstationary  processes,  for  example,  those  with  additive  components  of  trend  and  seasonality. 
Processes  containing  polynomial  trend  and/or  periodic  behavior  can  be  modeled  essentially  by 
allowing  the  autoregressive  polynomial  to  have  one  or  more  roots  on  the  unit  circle  in  the 
complex plane. One such type of model is called the autoregressive integrated moving average
(or  ARIMA)  process.  These  are  particularly  useful  for  modeling  processes  without  a  seasonal 
component,  but  which  are  "explosively"  nonstationary,  as  is  the  case  when  the  series  has  a 
deterministic  or  stochastic  polynomial  trend.  ARIMA  processes  are  defined  as  follows. 

Define the difference operator $\nabla\equiv(1-B)$, $\nabla^0=1$, and $\nabla^j=\nabla(\nabla^{j-1})$ for $j\ge1$. Let d be a
nonnegative integer. The stochastic process $\{X_t,\ t=1-d,2-d,\ldots,0,1,2,\ldots\}$ is called an ARIMA
(p,d,q) process if $\nabla^dX_t=Y_t$ where $\{Y_t\}$ is a causal ARMA(p,q) process with mean $\mu$. Thus, for
example, if $X_t=A_0+A_1t+\cdots+A_{d-1}t^{d-1}+Y_t$ where $\{Y_t\}$ is a causal ARMA process and the $A_i$'s
are arbitrary random variables, then $\{X_t\}$ is an ARIMA(p,d,q) for some p and q. This follows
easily from the result that $\nabla^dp_t=0$ for any polynomial $p_t=A_0+A_1t+\cdots+A_mt^m$ of degree $m<d$.
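The annihilation of polynomial trends by repeated differencing can be checked directly. The following is a minimal sketch; the helper name `difference` and the quadratic trend are hypothetical choices for illustration.

```python
def difference(x, d=1):
    """Apply nabla = (1 - B) to the list x, d times; each pass shortens it by one."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x


# Illustrative trend p_t = 3 - 2 t + 5 t^2, a polynomial of degree m = 2.
trend = [3 - 2 * t + 5 * t * t for t in range(10)]
print(difference(trend, 2))  # [10, 10, 10, 10, 10, 10, 10, 10]
print(difference(trend, 3))  # [0, 0, 0, 0, 0, 0, 0]
```

Two differences leave the constant $2\cdot5=10$, and a third difference removes it, in line with $\nabla^dp_t=0$ whenever $m<d$.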


In the next section, the proper centering and normalization of $\sum_{t=1}^nX_t$ to achieve an
asymptotic normal distribution is studied when $\{X_t\}$ is an ARIMA(p,d,q) process and the white
noise process $\{Z_t\}$ appearing in (1) and (2) is actually an $IID(0,\sigma^2)$ process. Interest in this
problem is stimulated by the modeling of medical or insurance claims. A typical model for
insurance claims is the so-called compound Poisson process. See Prabhu (1980), for example,
for extensive discussions of this model in the insurance risk context. Here, claims are generated
according to a nonhomogeneous Poisson process $\{N_T,\ T\ge0\}$, and successive claim costs are
assumed independent of $\{N_T,\ T\ge0\}$, and to form a sequence of iid random variables, $\{Y_t,\ t\ge1\}$.
Thus, total claim costs from the time period $[0,T]$ are given by

$$C_T=\sum_{t=1}^{N_T}Y_t \tag{5}$$
where a sum with upper index 0 is defined to be 0. If $E(N_T)=m(T)$, $m(T)\uparrow\infty$ as $T\to\infty$, and
$\{Y_t\}\sim IID(\mu,\tau^2)$, then it follows from elementary limit theory that

$$\frac{C_T-\mu\,m(T)}{\sqrt{(\mu^2+\tau^2)\,m(T)}}\xrightarrow{d}N(0,1) \tag{6}$$

as $T\to\infty$, where $N(m,\tau^2)$ denotes a random variable that is normally distributed with mean $m$
and variance $\tau^2$. The notation $X_n\xrightarrow{d}X$ as $n\to\infty$ means $P\{X_n\le x\}\to P\{X\le x\}$ as $n\to\infty$, for all $x$ at
which the function $x\mapsto P\{X\le x\}$ is continuous. The result (6) allows the distribution of total claim
cost $C_T$ to be approximated for large $T$. This model is important to the insurance industry, since
if premiums are collected at a constant rate $\rho$ per unit time, and the firm starts initially with a
cash reserve of $c$, then the quantity $c+\rho T-C_T$ represents, in a simplified setting, the monetary
reserve of the insurance company at time $T$, and the first time that this process hits the value
zero, the company becomes insolvent.
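The normal approximation (6) can be explored by simulation. The sketch below makes two simplifying assumptions that are not part of the text: the Poisson process has a constant rate, so that $m(T)=\text{rate}\cdot T$, and claim costs are exponentially distributed, so that $\tau^2=\mu^2$. All function names and parameter values are hypothetical.

```python
import math
import random


def total_claims(T, rate, mean_cost, rng):
    """One draw of C_T: exponential claim costs at the jumps of a Poisson process.

    Assumption for this sketch: constant claim rate, so m(T) = rate * T.
    """
    n_claims = 0
    clock = rng.expovariate(rate)          # time of the first claim
    while clock <= T:
        n_claims += 1
        clock += rng.expovariate(rate)     # exponential gaps between claims
    # A sum with upper index 0 is 0, matching the convention under (5).
    return sum(rng.expovariate(1.0 / mean_cost) for _ in range(n_claims))


def standardized_claims(T, rate, mean_cost, reps, seed=0):
    """Draws of (C_T - mu m(T)) / sqrt((mu^2 + tau^2) m(T)), as in (6)."""
    rng = random.Random(seed)
    mu, tau2 = mean_cost, mean_cost ** 2   # exponential costs: variance = mean^2
    m = rate * T
    scale = math.sqrt((mu ** 2 + tau2) * m)
    return [(total_claims(T, rate, mean_cost, rng) - mu * m) / scale
            for _ in range(reps)]


z = standardized_claims(T=100.0, rate=2.0, mean_cost=3.0, reps=2000)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / len(z)
print(round(mean, 2), round(var, 2))  # should be near 0 and 1
```

The sample mean and variance of the standardized totals should be near 0 and 1 when $m(T)$ is large.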

The  model  (5)  is  also  plausible  for  describing  medical  claims  /  compensation  costs 
associated  with  accidents  or  hazardous  materials  exposure.  In  this  context,  the  model  (5)  can  be 
made  more  realistic  by  allowing  the  claim  amounts  to  be  correlated  and/or  to  have  a  trend.  Legal 
(and  other)  precedents  /  interventions  and  economic  factors  can  affect  successive  claim  costs  to 
the  extent  that  an  ARIMA(p,d,q),  with  suitable  p,  d,  and  q,  would  be  a  more  appropriate  model. 
If  this  is  the  case,  then  in  order  to  develop  limit  theorems  similar  to  (6),  it  is  necessary  to  study 
the asymptotic distribution of $\sum_{t=1}^nX_t$, where $\{X_t\}$ is an ARIMA(p,d,q) process. To facilitate
this, it will be assumed from now on that the white noise process $\{Z_t\}$ that appears in (1) and (2)
is actually an IID sequence, i.e. that $\{Z_t\}\sim IID(0,\sigma^2)$.

The case where $\{X_t-\mu\}=\{Y_t\}\sim ARMA(p,q)$ is a special case of Theorem 7.1.2 of
Brockwell and Davis (1992), which shows that

$$n^{-1/2}\sum_{t=1}^n(X_t-\mu)\xrightarrow{d}N(0,v^2)\quad\text{as } n\to\infty, \tag{7}$$

where

$$v^2=\gamma_Y(0)+2\sum_{h=1}^{\infty}\gamma_Y(h),\qquad \gamma_Y(h)=\mathrm{cov}(Y_t,Y_{t+h}), \tag{8}$$

provided that $v^2>0$. Then the factor $(\mu^2+\tau^2)$ in (6) becomes $(\mu^2+v^2)$. Since $\tau^2$ corresponds to
$\gamma_Y(0)$, the asymptotic variance of $C_T$ could be larger or smaller than in the IID case, depending
on the values of the autocovariances. When $\{X_t\}$ is an ARIMA(p,d,q) process, then $n^{1/2}$ must
be replaced by $n^{d+1/2}$ as the proper normalization of $\sum_{t=1}^nX_t$, as will be seen in the next section.
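To see how $v^2$ in (8) can fall on either side of $\gamma_Y(0)$, the sketch below evaluates both for a causal AR(1) process, using the standard linear-process identity $\gamma_Y(h)=\sigma^2\sum_j\psi_j\psi_{j+h}$ with $\psi_j=\phi^j$. The truncation at 200 terms and the values $\phi=\pm0.5$ are assumptions made for illustration, and `long_run_variance` is a hypothetical helper name.

```python
def long_run_variance(psi, sigma2=1.0):
    """Return (gamma(0), v^2) with v^2 = gamma(0) + 2 * sum_{h>=1} gamma(h).

    gamma(h) = sigma2 * sum_j psi_j * psi_{j+h}, computed from a truncated
    list of psi weights (an approximation when the true sequence is infinite).
    """
    n = len(psi)
    gamma = [sigma2 * sum(psi[j] * psi[j + h] for j in range(n - h))
             for h in range(n)]
    return gamma[0], gamma[0] + 2.0 * sum(gamma[1:])


for phi in (0.5, -0.5):
    psi = [phi ** j for j in range(200)]   # causal AR(1): psi_j = phi^j
    gamma0, v2 = long_run_variance(psi)
    print(phi, round(gamma0, 4), round(v2, 4))
```

With $\sigma^2=1$, both cases have $\gamma_Y(0)=1/(1-\phi^2)\approx1.333$, while $v^2=1/(1-\phi)^2$ is about 4 for $\phi=0.5$ and about 0.444 for $\phi=-0.5$, so positive dependence inflates the asymptotic variance and negative dependence deflates it.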
2. Central Limit Theorem for a Class of ARIMA Processes


Let $\{X_t,\ t=1-d,2-d,\ldots,0,1,2,\ldots\}$ be an ARIMA(p,d,q) process, satisfying $\nabla^dX_t=Y_t+\mu$,
where $\{Y_t\}$ is a causal ARMA(p,q) process as in (1) and (2) with $\{Z_t\}\sim IID(0,\sigma^2)$. Define the
usual operation of generating factorial polynomials by $k^{(j)}=\prod_{i=1}^{j}(k-i+1)=k(k-1)\cdots(k-j+1)$ for
integers $j$ and $k$ with $j>0$, and $k^{(0)}=1$. Thus, with $k$ treated as a variable, it follows that for $j\ge0$,

$$\nabla\,\frac{(k+1)^{(j+1)}}{(j+1)!}=\frac{k^{(j)}}{j!}, \tag{9}$$


and 


The  main  result  is  the  following  theorem,  which  holds  even  for  the  case  d=0  with  the  convention 
that  summations  with  upper  limit  0  are  taken  to  be  0. 

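Theorem 1. Let $\gamma_Y(\cdot)$ be the autocovariance function of $Y_t=\nabla^dX_t-\mu$, and suppose that
$\gamma_Y(0)+2\sum_{k=1}^{\infty}\gamma_Y(k)>0$. Then, as $n\to\infty$, the distribution of

$$\frac{\sum_{t=1}^{n}\Big[X_t-\sum_{i=0}^{d-1}\nabla^iX_0\,\dfrac{(t+i-1)^{(i)}}{i!}-\mu\,\dfrac{(t+d-1)^{(d)}}{d!}\Big]}{\dfrac{1}{d!}\Big[\gamma_Y(0)\sum_{v=1}^{n}\big[(v+d-1)^{(d)}\big]^2+2\sum_{k=1}^{n-1}\gamma_Y(k)\sum_{v=1}^{n-k}(v+d-1)^{(d)}(v+k+d-1)^{(d)}\Big]^{1/2}} \tag{11}$$

converges to that of N(0,1).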


Before developing a proof of Theorem 1, a few remarks and a corollary will help clarify
this result. First, the centering and norming sequences in Theorem 1 are chosen to match the
mean and variance of (11) with that of N(0,1) for each $n$, not just in the limit. This tends to
make the normal approximation more accurate. Because $\{Y_t\}\sim ARMA(p,q)$, and therefore $Y_t$
has the representation (4), it can be shown that the series (8) converges absolutely, and

$$\gamma_Y(0)+2\sum_{k=1}^{\infty}\gamma_Y(k)=\sigma^2\Big(\sum_{j=0}^{\infty}\psi_j\Big)^2, \tag{12}$$

where $\{\psi_j\}$ are the constants in the representation (4) with $\psi_j=0$ for $j<0$ by causality. By
elementary asymptotic analysis of sums of integer powers, both of the sums on $v$ appearing in
the denominator of (11) are asymptotic to $n^{2d+1}/(2d+1)$ as $n\to\infty$. Finally, the polynomial with
stochastic coefficients that appears in the numerator of (11) leads, after summation, to a term that
is equal to $\mu(n+d)^{(d+1)}/[(d+1)!]+O_p(n^d)$ as $n\to\infty$, again by simple asymptotic analysis of sums
of integer powers. Hence, the centering constant in (11) can be modified, and the normalization
constants simplified, yielding the following corollary.

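Corollary 1. Under the conditions of Theorem 1,

$$n^{-d-1/2}\sum_{t=1}^{n}\Big(X_t-\mu\,\frac{(t+d-1)^{(d)}}{d!}\Big)\xrightarrow{d}N(0,\omega^2)\quad\text{as } n\to\infty,$$

where

$$\omega^2=(2d+1)^{-1}\,d!^{-2}\Big[\gamma_Y(0)+2\sum_{k=1}^{\infty}\gamma_Y(k)\Big].$$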
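The normalization $n^{d+1/2}$ and the constant $\omega^2$ can be compared with an exact finite-sample variance in the simplest case. The sketch below assumes that $\{Y_t\}$ is white noise, so that $\gamma_Y(k)=0$ for $k\ge1$ and the variance of $\sum_{v=1}^nY_{n-v+1}(v+d-1)^{(d)}/d!$ is $\sigma^2\sum_v[(v+d-1)^{(d)}/d!]^2$. It uses the fact that $(v+d-1)^{(d)}/d!$ is the binomial coefficient $\binom{v+d-1}{d}$. Function names are hypothetical.

```python
from math import comb, factorial


def exact_variance(n, d, sigma2=1.0):
    """Variance of sum_{v=1}^n Y_{n-v+1} (v+d-1)^{(d)}/d! for white noise Y_t.

    (v+d-1)^{(d)}/d! equals the binomial coefficient C(v+d-1, d).
    """
    return sigma2 * sum(comb(v + d - 1, d) ** 2 for v in range(1, n + 1))


def asymptotic_variance(n, d, sigma2=1.0):
    """n^{2d+1} * omega^2 with gamma_Y(0) = sigma2 and gamma_Y(k) = 0, k >= 1."""
    return n ** (2 * d + 1) * sigma2 / ((2 * d + 1) * factorial(d) ** 2)


for d in (0, 1, 2):
    for n in (10, 100, 1000):
        print(d, n, exact_variance(n, d) / asymptotic_variance(n, d))
```

The printed ratios approach 1 as $n$ grows, which is the finite-sample counterpart of replacing the exact norming in (11) by $n^{d+1/2}\omega$.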

The proof of Theorem 1 requires several lemmas. Ultimately, the goal is to express
$\sum_{t=1}^nX_t$ as a weighted sum of the $Y_t$, which have the representation $Y_t=\sum_{j=0}^{\infty}\psi_jZ_{t-j}$, and then to
exploit the fact that $\{Z_t\}\sim IID(0,\sigma^2)$ in order to apply a classical central limit theorem. The first
lemma is fundamental in this goal.


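Lemma 1. Suppose that $\nabla^dX_t=Y_t+\mu$, $t\ge1$. Then for $t\ge1$, $X_t$ can be expressed as

$$X_t=\sum_{i=0}^{d-1}\nabla^iX_0\,\frac{(t+i-1)^{(i)}}{i!}+\mu\,\frac{(t+d-1)^{(d)}}{d!}+\sum_{v=1}^{t}Y_{t-v+1}\,\frac{(v+d-2)^{(d-1)}}{(d-1)!} \tag{13}$$

and for $n\ge1$

$$\sum_{t=1}^{n}X_t=\sum_{i=0}^{d-1}\nabla^iX_0\,\frac{(n+i)^{(i+1)}}{(i+1)!}+\mu\,\frac{(n+d)^{(d+1)}}{(d+1)!}+\sum_{v=1}^{n}Y_{n-v+1}\,\frac{(v+d-1)^{(d)}}{d!}. \tag{14}$$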
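The two representations in Lemma 1 can be checked numerically by generating $X_t$ directly from $\nabla^dX_t=Y_t+\mu$ and comparing with the closed forms. The sketch below assumes $d\ge1$, uses integer-valued inputs so that every comparison is exact, and writes $k^{(j)}/j!$ as the binomial coefficient $\binom{k}{j}$. The function name `check_lemma1` and all parameter values are hypothetical.

```python
from math import comb
import random


def check_lemma1(d, n, mu=2, seed=0):
    """Compare a directly generated path with the closed forms (13) and (14).

    Requires d >= 1. Returns (ok13, ok14); k^{(j)}/j! is computed as C(k, j).
    """
    rng = random.Random(seed)
    y = [rng.randint(-5, 5) for _ in range(n)]             # Y_1, ..., Y_n
    x = {t: rng.randint(-5, 5) for t in range(1 - d, 1)}   # X_{1-d}, ..., X_0
    # nabla^d X_t = sum_{j=0}^d (-1)^j C(d, j) X_{t-j} = Y_t + mu, solved for X_t.
    for t in range(1, n + 1):
        x[t] = y[t - 1] + mu - sum((-1) ** j * comb(d, j) * x[t - j]
                                   for j in range(1, d + 1))
    # Backward differences of the initial values: nabla^i X_0, i = 0..d-1.
    nabla0 = [sum((-1) ** j * comb(i, j) * x[-j] for j in range(i + 1))
              for i in range(d)]

    def rhs13(t):
        return (sum(nabla0[i] * comb(t + i - 1, i) for i in range(d))
                + mu * comb(t + d - 1, d)
                + sum(y[t - v] * comb(v + d - 2, d - 1) for v in range(1, t + 1)))

    rhs14 = (sum(nabla0[i] * comb(n + i, i + 1) for i in range(d))
             + mu * comb(n + d, d + 1)
             + sum(y[n - v] * comb(v + d - 1, d) for v in range(1, n + 1)))
    ok13 = all(x[t] == rhs13(t) for t in range(1, n + 1))
    ok14 = sum(x[t] for t in range(1, n + 1)) == rhs14
    return ok13, ok14


for d in (1, 2, 3):
    print(d, check_lemma1(d, n=15))
```

Each call returns a pair of booleans, the first for (13) at every $t\le n$ and the second for (14).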

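Proof: It may be assumed without loss of generality that $\mu=0$. The first formula follows
from an induction argument on d. If d=1, then for $k\ge1$, $X_k-X_{k-1}=Y_k$ by definition, and
summing this gives $X_t=X_0+\sum_{k=1}^tY_k=X_0+\sum_{v=1}^tY_{t-v+1}$, so that (13) holds for d=1. Assume (13)
holds for some $d\ge1$, and rewrite it as

$$X_t=\sum_{i=0}^{d-1}\nabla^iX_0\,\frac{(t+i-1)^{(i)}}{i!}+\sum_{v=1}^{t}\nabla^dX_{t-v+1}\,\frac{(v+d-2)^{(d-1)}}{(d-1)!}. \tag{15}$$

Suppose that $\nabla^{d+1}X_t=Y_t$. Then $\nabla^dX_t=\sum_{k=1}^tY_k+\nabla^dX_0$. Substituting $t-v+1$ for $t$ in this, and
using (15), the induction hypothesis, it follows that

$$X_t=\sum_{i=0}^{d-1}\nabla^iX_0\,\frac{(t+i-1)^{(i)}}{i!}+\sum_{v=1}^{t}\Big(\sum_{k=1}^{t-v+1}Y_k+\nabla^dX_0\Big)\frac{(v+d-2)^{(d-1)}}{(d-1)!}
=\sum_{i=0}^{d}\nabla^iX_0\,\frac{(t+i-1)^{(i)}}{i!}+\sum_{v=1}^{t}Y_{t-v+1}\,\frac{(v+d-1)^{(d)}}{d!}$$

by applications of (9) and (10), a reversal of the order of summation, and a change of variable.
This completes the induction proof of (13). Relation (14) follows by summing relation (13) from
t=1 to n, reversing the order of summations, and using (9).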

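Lemma 2. Let $\{Z_t\}\sim IID(0,\sigma^2)$, and for each $n\ge1$, let $a_{t,n}$, $1\le t\le n$, be a sequence of
constants satisfying

$$\max_{1\le t\le n}\frac{a_{t,n}^2}{\sum_{t=1}^{n}a_{t,n}^2}\to0\quad\text{as } n\to\infty. \tag{16}$$

Let $X_{t,n}=a_{t,n}Z_t\big/\big[\sum_{t=1}^{n}a_{t,n}^2\big]^{1/2}$, $1\le t\le n$, and $S_n=\sum_{t=1}^{n}X_{t,n}$. Then as $n\to\infty$, $S_n\xrightarrow{d}N(0,\sigma^2)$.

Proof: $\sum_{t=1}^{n}E(X_{t,n}^2)=\sigma^2$ for all $n$. Let $\varepsilon>0$. Then

$$\sum_{t=1}^{n}E\big(X_{t,n}^2\,1\{|X_{t,n}|>\varepsilon\}\big)\le E\Big(Z_1^2\,1\Big\{|Z_1|\max_{1\le t\le n}|a_{t,n}|\Big[\sum_{t=1}^{n}a_{t,n}^2\Big]^{-1/2}>\varepsilon\Big\}\Big)\to0$$

as $n\to\infty$ by the dominated convergence theorem. Hence, by the Lindeberg-Feller Central Limit
Theorem (Durrett (1991), p. 98), the result follows.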
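Condition (16) requires that no single weight dominates the sum of squares. A minimal sketch, assuming the polynomial weights $a_{v,n}=(v+d-1)^{(d)}/d!=\binom{v+d-1}{d}$ that arise in Lemma 1, with a geometric weight sequence added as a hypothetical counterexample:

```python
from math import comb


def lindeberg_ratio(weights):
    """max_t a_t^2 / sum_t a_t^2, the quantity that must vanish in (16)."""
    squares = [a * a for a in weights]
    return max(squares) / sum(squares)


d = 2
for n in (10, 100, 1000, 10000):
    poly = [comb(v + d - 1, d) for v in range(1, n + 1)]   # (v+d-1)^{(d)} / d!
    print(n, lindeberg_ratio(poly))

# Geometric weights a_t = 2^t violate (16): the last term carries about 3/4
# of the total sum of squares no matter how large n is.
print(lindeberg_ratio([2 ** t for t in range(1, 41)]))
```

For the polynomial weights the ratio decays at rate $(2d+1)/n$, so (16) holds, whereas for the geometric weights it stays near 0.75.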


The final lemma needed in the proof of Theorem 1 is Proposition 6.3.9 from Brockwell
and Davis (1992). A sketch of its proof based on convergence of characteristic functions is
given there. Here, a slightly different proof is presented in detail. For random k-vectors $X_n$ and
$X$, $X_n\xrightarrow{d}X$ as $n\to\infty$ means that $P\{X_n\in A\}\to P\{X\in A\}$ as $n\to\infty$ for every k-dimensional Borel set
$A$ whose boundary $\partial A$ satisfies $P\{X\in\partial A\}=0$, which is equivalent to the condition that
$Ef(X_n)\to Ef(X)$ as $n\to\infty$ for every bounded and continuous real valued function $f$ on the
k-dimensional real numbers. This is in turn equivalent to $Ef(X_n)\to Ef(X)$ as $n\to\infty$ for all bounded
and uniformly continuous real valued functions $f$ on the k-dimensional real numbers.


Lemma 3. Let $X_n$, n=1, 2, ... and $Y_{nj}$, j=1, 2, ...; n=1, 2, ... be random k-vectors such
that

(i) $Y_{nj}\xrightarrow{d}Y_j$ as $n\to\infty$ for each fixed $j$

(ii) $Y_j\xrightarrow{d}Y$ as $j\to\infty$

(iii) $\lim_{j\to\infty}\limsup_{n\to\infty}P\{|X_n-Y_{nj}|>\varepsilon\}=0$ for every $\varepsilon>0$, where $|\cdot|$ signifies the usual
Euclidean norm.

Then $X_n\xrightarrow{d}Y$ as $n\to\infty$.

Proof: Let $f$ be a bounded uniformly continuous real valued function defined on the
k-dimensional real numbers. It is sufficient to show that $|Ef(X_n)-Ef(Y)|\to0$ as $n\to\infty$. Fix an $\varepsilon>0$.
There exists a $\delta>0$ such that $|f(x)-f(y)|\le\varepsilon$ for any $x$ and $y$ satisfying $|x-y|\le\delta$. By the triangle
inequality,

$$|Ef(X_n)-Ef(Y)|\le E|f(X_n)-f(Y_{nj})|+|Ef(Y_{nj})-Ef(Y_j)|+|Ef(Y_j)-Ef(Y)|.$$

Denoting an upper bound of the function $|f|$ by $C$, the first term on the right side of the inequality
is bounded by

$$2C\,P\{|X_n-Y_{nj}|>\delta\}+\varepsilon.$$

It follows from (i) that for any $j$,

$$\limsup_{n\to\infty}|Ef(X_n)-Ef(Y)|\le\varepsilon+2C\limsup_{n\to\infty}P\{|X_n-Y_{nj}|>\delta\}+|Ef(Y_j)-Ef(Y)|.$$

Taking the $\limsup_{j\to\infty}$ on both sides and using (ii) and (iii) establishes the result, since $\varepsilon>0$ was
arbitrary.


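Proof of Theorem 1: By Lemma 1, the remarks following Theorem 1, and the fact that
if $c_n\to c$ and $Z_n\xrightarrow{d}Z$ as $n\to\infty$, then $c_nZ_n\xrightarrow{d}cZ$ as $n\to\infty$, it suffices to show that

$$\frac{(2d+1)^{1/2}\,d!\,\sum_{v=1}^{n}Y_{n-v+1}(v+d-1)^{(d)}/d!}{\sigma\,\big|\sum_{j=0}^{\infty}\psi_j\big|\,n^{d+1/2}}\xrightarrow{d}N(0,1) \tag{17}$$

as $n\to\infty$. By representation (4) with $\psi_j=0$ for $j<0$ by causality, $Y_t=\sum_{j=0}^{\infty}\psi_jZ_{t-j}$ for all $t$. Define

$$Q^*_{nm}=\sum_{v=1}^{n}\Big(\sum_{j=0}^{m}\psi_jZ_{n-v+1-j}\Big)\frac{(v+d-1)^{(d)}}{d!}.$$

Changing indices of summation by letting $t=n-v+1$, and then $j=t-k$, it follows that, for $n>m$,

$$Q^*_{nm}=\sum_{k=1-m}^{0}Z_k\sum_{j=1-k}^{m}\psi_j\frac{(n-k-j+d)^{(d)}}{d!}+\sum_{k=1}^{n-m}Z_k\sum_{j=0}^{m}\psi_j\frac{(n-k-j+d)^{(d)}}{d!}+\sum_{k=n-m+1}^{n}Z_k\sum_{j=0}^{n-k}\psi_j\frac{(n-k-j+d)^{(d)}}{d!}.$$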

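Denote these last three double sums respectively by $Q^*_{nm}(1)$, $Q^*_{nm}(2)$, and $Q^*_{nm}(3)$. For
fixed $m$, $Q^*_{nm}(1)=O_p(n^d)$, and thus $n^{-d-1/2}Q^*_{nm}(1)\xrightarrow{P}0$ as $n\to\infty$. Also, for large $n$,
$\mathrm{Var}(Q^*_{nm}(3))\le\sigma^2m\,n^{2d}\big(\sum_{j=0}^{m}|\psi_j|/d!\big)^2$, so that $\mathrm{Var}(n^{-d-1/2}Q^*_{nm}(3))\to0$ as $n\to\infty$ and hence
$n^{-d-1/2}Q^*_{nm}(3)\xrightarrow{P}0$ as $n\to\infty$. Denoting the coefficient of $Z_k$ in the second double summation by
$a_{k,n}$, it follows from the summation-by-parts formula that

$$a_{k,n}=\Big(\sum_{j=0}^{m}\psi_j\Big)\frac{(n-k-m+d)^{(d)}}{d!}+O(n^{d-1}), \tag{18}$$

where the $O(n^{d-1})$ term is uniform in $k$, $0<k\le n-m$, for fixed $m$. Thus $\max_k a_{k,n}^2\big/\sum_k a_{k,n}^2\to0$ as $n\to\infty$,
hence (16) is satisfied, and by Lemma 2, and the fact that (18) implies that

$$\sum_{k=1}^{n-m}a_{k,n}^2\sim\frac{n^{2d+1}}{(2d+1)\,d!^2}\Big(\sum_{j=0}^{m}\psi_j\Big)^2,$$

it follows that for each fixed $m$,

$$Q_{nm}=\frac{(2d+1)^{1/2}\,d!\,Q^*_{nm}}{\sigma\,\big|\sum_{j=0}^{\infty}\psi_j\big|\,n^{d+1/2}}\xrightarrow{d}\frac{\big|\sum_{j=0}^{m}\psi_j\big|}{\big|\sum_{j=0}^{\infty}\psi_j\big|}\,N(0,1) \tag{19}$$

and obviously, as $m\to\infty$, the last random variable in (19) converges in distribution to N(0,1). To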


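conclude (17) from Lemma 3, it is sufficient, by Chebychev's inequality, to show that

$$\lim_{m\to\infty}\limsup_{n\to\infty}\mathrm{Var}\Bigg(\frac{(2d+1)^{1/2}\,d!\,\sum_{v=1}^{n}\big(Y_{n-v+1}-\sum_{j=0}^{m}\psi_jZ_{n-v+1-j}\big)(v+d-1)^{(d)}/d!}{\sigma\,\big|\sum_{j=0}^{\infty}\psi_j\big|\,n^{d+1/2}}\Bigg)=0.$$

Let $D(n,d)=\sigma\big|\sum_{j=0}^{\infty}\psi_j\big|\,n^{d+1/2}\,(2d+1)^{-1/2}\,d!^{-1}$. Notice that the random variable above has mean 0, and that

$$\mathrm{Var}\Bigg(\frac{1}{D(n,d)}\sum_{v=1}^{n}\Big(\sum_{j=m+1}^{\infty}\psi_jZ_{n-v+1-j}\Big)\frac{(v+d-1)^{(d)}}{d!}\Bigg)
=\frac{\sigma^2}{D(n,d)^2}\sum_{j=m+1}^{\infty}\sum_{k=m+1}^{\infty}\psi_j\psi_k\sum_{v}\frac{(v+d-1)^{(d)}(v+d-j+k-1)^{(d)}}{d!^2},$$

where the inner sum extends over those $v$ with $1\le v\le n$ and $1\le v-j+k\le n$. By the Cauchy-Schwarz inequality this is bounded by

$$\frac{\sigma^2}{D(n,d)^2}\Big(\sum_{j=m+1}^{\infty}|\psi_j|\Big)^2\sum_{v=1}^{n}\Big(\frac{(v+d-1)^{(d)}}{d!}\Big)^2\to\frac{\big(\sum_{j=m+1}^{\infty}|\psi_j|\big)^2}{\big(\sum_{j=0}^{\infty}\psi_j\big)^2}$$

as $n\to\infty$. The result now follows by letting $m\to\infty$. The final form (11) of Theorem 1 follows by
using (13) and (14) to verify that (11) has mean 0 and variance 1.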


Conclusions 

A central limit theorem has been developed for centered and normalized sums of random
variables that constitute an ARIMA(p,d,q) process. If $\{X_t\}$ is an ARIMA(p,d,q) process
satisfying $\nabla^dX_t=\mu+Y_t$ with $\{Y_t\}\sim ARMA(p,q)$, it is seen that the proper normalization and
centering sequences in $b_n^{-1}\sum_{t=1}^{n}(X_t-a_t)$ are $b_n=n^{d+1/2}$ and $a_t=\mu(t+d-1)^{(d)}/d!$.

Among other applications, this central limit theorem is important in making large sample
approximations related to sums of the form $\sum_{t=1}^{N_T}X_t$, where $\{N_T,\ T\ge0\}$ is a nonhomogeneous
Poisson process and $\{X_t\}$ is an ARIMA(p,d,q) process, independent of $\{N_T,\ T\ge0\}$. This model
provides a realistic representation of the total claims cost associated with medical claims /
compensation costs associated with accidents or hazardous materials exposure.